Model complexity and information in the data: could it be a house built on sand?

Author

Subhash R Lele

Abstract

Heisey et al. (2010), in an interesting paper, try to address the very difficult problem of analyzing spatially referenced, age-specific prevalence data. The general goal of the analysis is to understand how the force of infection changes as a function of age, time, and space. To further complicate matters, all the data considered in the paper are censored observations. Binary data are notoriously difficult to analyze, especially when latent processes are involved and prevalence is very low. Frankly, I was surprised by the complexity of the models they consider and the limited amount of information available to fit these models. I would like to congratulate them for trying to address such a difficult problem and, in the process, bringing to the attention of ecologists some important statistical models in survival analysis.

How does one generally deal with the conflicting issues of lack of information and the desire to conduct inference about complex underlying processes? The standard approach is to compensate for lack of information by adding assumptions. This is done routinely in most statistical analyses by assuming a parametric model. For example, one can conduct inference in ANOVA without assuming any specific relationship between the treatment means if replicate observations are available at each treatment level. If such replicate data are not available, instead of giving up, we assume that there is a linear (or some other parametric) relationship between the covariates and the response; this is the regression approach. This is a smoothing assumption. Similarly, in one of the fundamental papers on statistical inference in the presence of nuisance parameters, Kiefer and Wolfowitz (1956) showed that simply assuming that the nuisance parameters arise from a distribution is enough of a smoothing assumption to estimate not only the parameters of interest but also the distribution function from which the nuisance parameters are assumed to have arisen. Heisey et al. (2010) try to get around the limited information available in the prevalence data, where all observations are censored, by imposing constraints on the log-hazard, a smoothing assumption of another sort. This is the easy part.

The real questions are: (1) Given the limited amount of information in the data, what assumptions do we need to add before some inference is feasible? and (2) Are these inferences primarily driven by the data or by the assumptions? Technically, the answer to the first question is straightforward: add assumptions until the parameters in the model, at least the ones that are of scientific interest, are estimable given the data. The second question is qualitative. It is partially addressed by studying the sensitivity of the inferences on the parameters of interest (assuming they are identifiable) to violations of the assumptions. I will discuss these issues in the remainder of the commentary. I assume readers are familiar with the basic descriptions in Heisey et al. (2010).

Perhaps the easiest way around the limited information in the prevalence data is to assume a specific parametric model for the log-hazard function. This does not guarantee that the parameters will be identifiable, but it has the best chance. Heisey et al. (2010) do not take this easy way out. They aspire to assume less about the form of the log-hazard function. As they point out, the “nonparametric” MLE of the log-hazard is very choppy and unstable. It is generally not consistent, at least not at the usual √...

Journal title:

Ecology

Volume 91, Issue 12

Pages -

Publication date 2010
